Abstract

This paper studies the sensitivity to the ob- 
servations of the block/group Lasso solution 
to an overdetermined linear regression model. 
Such a regularization is known to promote 
sparsity patterns structured as nonoverlap- 
ping groups of coefficients. Our main contri- 
bution provides a local parameterization of 
the solution with respect to the observations. 
As a byproduct, we give an unbiased esti- 
mate of the degrees of freedom of the group 
Lasso. Among other applications of such re- 
sults, one can choose in a principled and ob- 
jective way the regularization parameter of 
the Lasso through model selection criteria. 

1. Introduction 

This paper deals with the overdetermined linear regression
model of the form $y = X\beta_0 + \varepsilon$, where $y \in \mathbb{R}^Q$
is the observation/response vector, $\beta_0 \in \mathbb{R}^N$ the
regression vector, $X \in \mathbb{R}^{Q \times N}$ is the design matrix whose columns
are linearly independent, and $\varepsilon$ is an additive noise.
Note that $Q \geq N$ and $X^T X$ is an invertible matrix.

1.1. Group Lasso 

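A block segmentation $\mathcal{B}$ corresponds to a disjoint
union of the set of indices, i.e. $\bigcup_{b \in \mathcal{B}} b = \{1, \ldots, N\}$ and,
for each $b, b' \in \mathcal{B}$ with $b \neq b'$, $b \cap b' = \emptyset$. For $\beta \in \mathbb{R}^N$ and each
$b \in \mathcal{B}$, $\beta_b = (\beta_i)_{i \in b}$ is the subvector of $\beta$ whose entries
are indexed by the block $b$, and $|b|$ is the cardinality
of $b$.

We consider the Group Lasso or Block Sparse regularization
introduced by (Bakin, 1999; Yuan & Lin, 2006), which reads

$$\min_{\beta \in \mathbb{R}^N} \frac{1}{2} \|y - X\beta\|^2 + \lambda \sum_{b \in \mathcal{B}} \|\beta_b\| \qquad (\mathcal{P}_\lambda(y))$$

where $\lambda > 0$ is the so-called regularization parameter
and $\|\cdot\|$ is the $\ell^2$-norm. Note that if each block $b$ is
of size 1, we recover the standard Lasso (Tibshirani,
1996).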
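As an illustration of the definitions above, the following minimal sketch evaluates the group Lasso objective for a given block segmentation. It assumes NumPy and 0-based indices (the text uses 1-based indices); the function names `is_block_segmentation` and `group_lasso_objective` are hypothetical helpers, not part of any library.

```python
import numpy as np

def is_block_segmentation(blocks, N):
    """True if the blocks are pairwise disjoint and cover {0, ..., N-1}."""
    flat = [i for b in blocks for i in b]
    return sorted(flat) == list(range(N))

def group_lasso_objective(X, y, beta, blocks, lam):
    """0.5 * ||y - X beta||^2 + lam * sum over blocks of ||beta_b||."""
    resid = y - X @ beta
    penalty = sum(np.linalg.norm(beta[b]) for b in blocks)
    return 0.5 * float(resid @ resid) + lam * float(penalty)

# Toy data (illustrative values only).
X = np.eye(2)
y = np.array([1.0, 2.0])
beta = np.array([3.0, 4.0])

# One block of size 2: the penalty is the l2-norm of beta.
obj_group = group_lasso_objective(X, y, beta, [[0, 1]], lam=2.0)
# Blocks of size 1: the penalty is the l1-norm, i.e. the standard Lasso.
obj_lasso = group_lasso_objective(X, y, beta, [[0], [1]], lam=2.0)
```

With blocks of size 1 the penalty reduces to the $\ell^1$-norm, which is the sense in which the standard Lasso is recovered.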

1.2. Degrees of Freedom 

We focus in this paper on the variations of the solution
$\beta^\star(y)$ of $\mathcal{P}_\lambda(y)$ with respect to the observations $y$.
This turns out to be a pivotal ingredient to compute 
the effective degrees of freedom (DOF) usually used 
to quantify the complexity of a statistical modeling 
procedure. 

Let $\hat{\mu}(y) = X\beta^\star(y)$ be the response or the prediction
associated to the estimator $\beta^\star(y)$ of $\beta_0$, and let $\mu_0 = X\beta_0$.
It is worth noting that $\hat{\mu}(y)$ is always uniquely
defined, even when $\beta^\star(y)$ is not, as is the case for a
rank-deficient or underdetermined design matrix $X$.
Note that any estimator $\hat{\mu}$ of $\mu_0$ might be considered.
We also make the assumption that $\varepsilon$ is an additive
white Gaussian noise term $\varepsilon \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_Q)$, hence $y$
follows the law $\mathcal{N}(\mu_0, \sigma^2 \mathrm{Id}_Q)$ and according to (Efron,
1986), the DOF is given by

$$df = \sum_{i=1}^{Q} \frac{\mathrm{cov}(y_i, [\hat{\mu}(y)]_i)}{\sigma^2}.$$

The well-known Stein's lemma asserts that if $\hat{\mu}$ is
weakly differentiable then its divergence is an unbiased
estimator of its DOF, i.e.

$$\widehat{df} = \mathrm{tr}(\partial\hat{\mu}(y)) \quad \text{and} \quad \mathbb{E}_\varepsilon(\widehat{df}) = df.$$

An unbiased estimator of the DOF provides an unbiased
estimate of the prediction risk of $\hat{\mu}(y)$ through
e.g. Mallows' $C_p$ (Mallows, 1973), the AIC
(Akaike, 1973), the SURE (Stein, 1981) or the GCV
(Golub et al., 1979). These quantities can serve as
model selection criteria to assess the accuracy of a
candidate model.
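To make the link between a DOF estimate and a risk estimate concrete, here is a minimal sketch of the SURE in its standard form, $\|y - \hat{\mu}(y)\|^2 - Q\sigma^2 + 2\sigma^2 \widehat{df}$. It assumes NumPy and a known noise level; the function name `sure` and its arguments are hypothetical.

```python
import numpy as np

def sure(y, mu_hat, df_hat, sigma):
    """Stein unbiased risk estimate of E||mu_hat - mu_0||^2.

    y      : observations, shape (Q,)
    mu_hat : prediction computed from y, shape (Q,)
    df_hat : unbiased estimate of the degrees of freedom
    sigma  : noise standard deviation (assumed known here)
    """
    Q = y.size
    return float(np.sum((y - mu_hat) ** 2) - Q * sigma ** 2
                 + 2.0 * sigma ** 2 * df_hat)

# Sanity check on the identity estimator mu_hat(y) = y, whose DOF is Q:
# the SURE equals Q * sigma^2, which is its true risk.
y = np.ones(4)
risk_identity = sure(y, y, df_hat=4, sigma=0.5)
```

Evaluating such a criterion over a grid of values of $\lambda$ and keeping the minimizer is one way to select the regularization parameter.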
1.3. Previous Works

In the special case of standard Lasso with a linearly 
independent design, (Zou et al., 2007) show that the 
number of nonzero coefficients is an unbiased esti- 
mate for the degrees of freedom. This work is ex- 
tended in (Dossal et al.) to an arbitrary design ma- 
trix. The DOF of the analysis sparse regularization 
(a.k.a. generalized Lasso in statistics) is studied in 
(Tibshirani & Taylor, 2012; Vaiter et al., 2012). A for- 
mula of an estimate of the DOF for the group Lasso 
when the design is orthogonal within each group is con- 
jectured in (Yuan & Lin, 2006). (Kato, 2009) studies 
the DOF of a generalization of the Lasso where now 
the regression coefficients are constrained to a closed 
convex set. He provides an unbiased estimate of the 
DOF for the constrained version of the group Lasso 
under the same orthogonality assumption on X as 
(Yuan & Lin, 2006). An estimate of the DOF for the
group Lasso is also given in (Solo & Ulfarsson, 2010)
by a heuristic proof in the full column rank case, but
its unbiasedness is not proved.

1.4. Contributions 

This paper proves a general result (Theorem 1) on the
variations of the solutions to $\mathcal{P}_\lambda(y)$ with respect to the
observation/response vector $y$. With such a result at
hand, Theorem 2 provides a provably unbiased estimate
of the DOF. These contributions are detailed in
Sections 2.1 and 2.2 below. The proofs are deferred to
Section 3 awaiting inspection by the interested reader.

1.5. Notations 

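We start with some notations used in the sequel. We extend
the notion of support, commonly used in sparsity,
by defining the $\mathcal{B}$-support $\mathrm{supp}_{\mathcal{B}}(\beta)$ of $\beta \in \mathbb{R}^N$ as

$$\mathrm{supp}_{\mathcal{B}}(\beta) = \{ b \in \mathcal{B} \setminus \|\beta_b\| \neq 0 \}.$$

The size of $\mathrm{supp}_{\mathcal{B}}(\beta)$ is defined as $|\mathrm{supp}_{\mathcal{B}}(\beta)| = \sum_{b \in \mathrm{supp}_{\mathcal{B}}(\beta)} |b|$.
We denote by $X_I$, where $I$ is a $\mathcal{B}$-support,
the matrix formed by the columns $X_i$ where $i$ is an
element of $b \in I$. We introduce the following block-diagonal
operators

$$\delta_\beta : v \in \mathbb{R}^{|I|} \mapsto (v_b / \|\beta_b\|)_{b \in I} \in \mathbb{R}^{|I|}$$

and

$$P_\beta : v \in \mathbb{R}^{|I|} \mapsto (P_{\beta_b^\perp}(v_b))_{b \in I} \in \mathbb{R}^{|I|},$$

where $P_{\beta_b^\perp}$ is the projector orthogonal to $\beta_b$. For any
operator $A$, we denote $A^T$ its adjoint.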
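The two block-diagonal operators can be sketched as follows, acting on vectors indexed by the active blocks. This is only an illustration assuming NumPy, 0-based indices local to the support, and nonzero blocks of `beta`; the names `delta` and `proj` are hypothetical.

```python
import numpy as np

def delta(beta, v, blocks):
    """delta_beta: divide each block v_b by ||beta_b|| (beta_b must be nonzero)."""
    out = np.zeros_like(v, dtype=float)
    for b in blocks:
        out[b] = v[b] / np.linalg.norm(beta[b])
    return out

def proj(beta, v, blocks):
    """P_beta: project each block v_b onto the orthogonal complement of beta_b."""
    out = np.zeros_like(v, dtype=float)
    for b in blocks:
        u = beta[b] / np.linalg.norm(beta[b])   # unit vector along beta_b
        out[b] = v[b] - (u @ v[b]) * u
    return out

# Toy example with one block of size 2 and one block of size 1.
blocks = [[0, 1], [2]]
beta = np.array([3.0, 4.0, 2.0])
v = np.array([1.0, 0.0, 5.0])

scaled = delta(beta, v, blocks)          # blocks divided by 5 and by 2
projected = proj(beta, v, blocks)        # component along beta_b removed
```

Both operators act blockwise, one by a scalar and the other by a projector, so they commute; this is the property used later to show that their composition is symmetric positive semi-definite.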

2. Main results 

Note that as $X$ is assumed to have full column rank, $\mathcal{P}_\lambda(y)$
has exactly one global minimizer $\beta^\star(y)$. Hence, we
define the single-valued mapping $y \mapsto \beta^\star(y)$.

2.1. Local Parameterization

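Let $I$ be the $\mathcal{B}$-support of some vector $\beta$. For any block
$b \notin I$, we define

$$\mathcal{H}_{I,b} = \left\{ y \in \mathbb{R}^Q \setminus \exists \beta : \forall c \in I, \; (\|X_b^T r\|, X_c^T r) = \left(\lambda, \lambda \frac{\beta_c}{\|\beta_c\|}\right) \right\},$$

where $r = y - X_I \beta_I$.

Definition 1. The transition space $\mathcal{H}$ is defined as

$$\mathcal{H} = \bigcup_{I \in \mathcal{I}} \bigcup_{b \notin I} \mathcal{H}_{I,b},$$

where $\mathcal{I}$ is the set of subsets of $\{1, \ldots, N\}$ obtained
as unions of blocks.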

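We prove the following sensitivity theorem.

Theorem 1. Let $y \notin \mathcal{H}$, and $I = \mathrm{supp}_{\mathcal{B}}(\beta^\star(y))$ the
$\mathcal{B}$-support of $\beta^\star(y)$. There exists a neighborhood $\mathcal{O}$ of
$y$ such that

1. the $\mathcal{B}$-support of $\beta^\star(\bar{y})$ is constant on $\mathcal{O}$, i.e.

$$\forall \bar{y} \in \mathcal{O}, \quad \mathrm{supp}_{\mathcal{B}}(\beta^\star(\bar{y})) = I,$$

2. the mapping $\beta^\star$ is $C^1$ on $\mathcal{O}$ and its differential is
such that

$$[\partial\beta^\star(\bar{y})]_{I^c} = 0 \quad \text{and} \quad [\partial\beta^\star(\bar{y})]_I = d(\bar{y}), \qquad (1)$$

where

$$d(y) = \left( X_I^T X_I + \lambda \, \delta_{\beta^\star(y)} \circ P_{\beta^\star(y)} \right)^{-1} X_I^T$$

and

$$I^c = \{ b \in \mathcal{B} \setminus b \notin I \}.$$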

2.2. Degrees of Freedom 

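We consider the estimator $\hat{\mu}(y) = X\beta^\star(y)$.

Theorem 2. Let $\lambda > 0$. The mapping $y \mapsto \hat{\mu}(y)$ is of
class $C^1$ on $\mathbb{R}^Q \setminus \mathcal{H}$ and,

$$\mathrm{div}(\hat{\mu}(y)) = \mathrm{tr}(X_I d(y)), \qquad (2)$$

where $\beta^\star(y)$ is the solution of $\mathcal{P}_\lambda(y)$ and $I = \mathrm{supp}_{\mathcal{B}}(\beta^\star(y))$.
Moreover, the set $\mathcal{H}$ has zero Lebesgue
measure, thus if $\varepsilon \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_Q)$, equation (2) is an
unbiased estimate of the DOF of the group Lasso.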
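Equation (2) can be evaluated numerically once a solution is available. The sketch below is one possible way to do so, assuming NumPy, 0-based block indices, and a vector `beta` that is (numerically) the group Lasso solution returned by some solver; the function name `group_lasso_df` and the threshold `tol` used to detect active blocks are assumptions, not part of the original method.

```python
import numpy as np

def group_lasso_df(X, beta, blocks, lam, tol=1e-10):
    """Evaluate tr(X_I d(y)) with d(y) = (X_I^T X_I + lam * delta o P)^{-1} X_I^T."""
    # Active blocks: those where beta_b is nonzero (up to the assumed tolerance).
    active = [b for b in blocks if np.linalg.norm(beta[b]) > tol]
    if not active:
        return 0.0
    idx = np.concatenate([np.asarray(b) for b in active])
    X_I = X[:, idx]
    n = idx.size
    M = np.zeros((n, n))            # matrix of delta_beta o P_beta on R^{|I|}
    start = 0
    for b in active:
        v = beta[b]
        nrm = np.linalg.norm(v)
        k = len(b)
        P = np.eye(k) - np.outer(v, v) / nrm ** 2   # projector orthogonal to beta_b
        M[start:start + k, start:start + k] = P / nrm
        start += k
    d = np.linalg.solve(X_I.T @ X_I + lam * M, X_I.T)
    return float(np.trace(X_I @ d))

# Toy case X = Id, where the solution is known in closed form:
# y = (3, 4, 0.1, 0.1), lam = 1 gives beta = (2.4, 3.2, 0, 0).
X = np.eye(4)
beta = np.array([2.4, 3.2, 0.0, 0.0])
df_hat = group_lasso_df(X, beta, [[0, 1], [2, 3]], lam=1.0)
```

In this toy case the value agrees with the closed form $|J| - \lambda \sum_{b \subset J} (|b| - 1)/\|y_b\| = 2 - 1/5$.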

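We specialize this result to the Block Soft Thresholding.

Corollary 1. If $X = \mathrm{Id}$, one has

$$\widehat{df} = |J| - \lambda \sum_{b \subset J} \frac{|b| - 1}{\|y_b\|},$$

where $J = \bigcup \{ b \in \mathcal{B} \setminus \|y_b\| > \lambda \}$.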
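The closed form of Corollary 1 can be compared with a finite-difference estimate of the divergence of the block soft thresholding. The sketch below assumes NumPy and 0-based block indices; the names `block_soft_threshold`, `df_closed_form` and `divergence_fd`, as well as the step `eps`, are hypothetical choices for illustration.

```python
import numpy as np

def block_soft_threshold(y, blocks, lam):
    """Group Lasso solution when X = Id: shrink each block norm by lam."""
    beta = np.zeros_like(y, dtype=float)
    for b in blocks:
        nrm = np.linalg.norm(y[b])
        if nrm > lam:
            beta[b] = (1.0 - lam / nrm) * y[b]
    return beta

def df_closed_form(y, blocks, lam):
    """|J| - lam * sum over active blocks of (|b| - 1) / ||y_b||."""
    return sum(len(b) - lam * (len(b) - 1) / np.linalg.norm(y[b])
               for b in blocks if np.linalg.norm(y[b]) > lam)

def divergence_fd(f, y, eps=1e-6):
    """Central finite-difference estimate of sum_i d f_i / d y_i."""
    div = 0.0
    for i in range(y.size):
        e = np.zeros_like(y)
        e[i] = eps
        div += (f(y + e)[i] - f(y - e)[i]) / (2.0 * eps)
    return div

y = np.array([3.0, 4.0, 0.1, 0.1, 2.0, 0.0, 0.0])
blocks = [[0, 1], [2, 3], [4, 5, 6]]
lam = 1.0

df_formula = df_closed_form(y, blocks, lam)
df_numeric = divergence_fd(lambda z: block_soft_threshold(z, blocks, lam), y)
```

Here two blocks are active, of sizes 2 and 3 with norms 5 and 2, so both quantities should be close to $(2 - 1/5) + (3 - 2/2) = 3.8$. The finite-difference check is only meaningful away from the transition set, i.e. when no block norm is close to $\lambda$.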
3. Proofs

This section details the proofs of our results. We introduce
the following normalization operator

$$\mathcal{N}(\beta_I) = v \quad \text{where} \quad \forall b \in I, \; v_b = \frac{\beta_b}{\|\beta_b\|}.$$

We use the following lemma in our proofs, which is
a straightforward consequence of the first-order necessary
and sufficient condition for a minimizer of the
group Lasso problem $\mathcal{P}_\lambda(y)$.

Lemma 1. A vector $\beta \in \mathbb{R}^N$ is the solution of $\mathcal{P}_\lambda(y)$
if, and only if, these two conditions hold

1. On the $\mathcal{B}$-support $I = \mathrm{supp}_{\mathcal{B}}(\beta)$,

$$X_I^T (y - X_I \beta_I) = \lambda \mathcal{N}(\beta_I).$$

2. For all $b \in \mathcal{B}$ such that $b \notin I$, one has

$$\|X_b^T (y - X_I \beta_I)\| \leq \lambda.$$

A proof of this lemma can be found in (Bach, 2008).
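The two conditions of Lemma 1 are easy to test numerically for a candidate vector. The following sketch assumes NumPy and 0-based block indices; the function name `satisfies_lemma1` and the tolerance `tol` are hypothetical, and exact zeros are used to detect inactive blocks.

```python
import numpy as np

def satisfies_lemma1(X, y, beta, blocks, lam, tol=1e-8):
    """Check the first-order optimality conditions of the group Lasso."""
    # beta vanishes outside its support, so y - X beta = y - X_I beta_I.
    resid = y - X @ beta
    for b in blocks:
        corr = X[:, b].T @ resid
        nrm = np.linalg.norm(beta[b])
        if nrm > 0:
            # Active block: X_b^T r must equal lam * beta_b / ||beta_b||.
            if np.linalg.norm(corr - lam * beta[b] / nrm) > tol:
                return False
        elif np.linalg.norm(corr) > lam + tol:
            # Inactive block: ||X_b^T r|| must not exceed lam.
            return False
    return True

# Toy case X = Id, y = (3, 4, 0.1, 0.1), lam = 1.
X = np.eye(4)
y = np.array([3.0, 4.0, 0.1, 0.1])
blocks = [[0, 1], [2, 3]]
is_opt = satisfies_lemma1(X, y, np.array([2.4, 3.2, 0.0, 0.0]), blocks, 1.0)
is_opt_zero = satisfies_lemma1(X, y, np.zeros(4), blocks, 1.0)
```

The block soft thresholding of `y` passes the check, whereas the zero vector fails it because the first block violates the inequality of condition 2.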

3.1. Proof of Theorem 1 

We will use the following lemma.

Lemma 2. Let $\beta \in \mathbb{R}^N$ with $\mathcal{B}$-support $I$ and $\lambda > 0$. Then $X_I^T X_I + \lambda \delta_\beta \circ P_\beta$
is invertible.

Proof. We prove that $X_I^T X_I + \lambda \delta_\beta \circ P_\beta$ is symmetric
positive definite. Remark that $X_I^T X_I$ is a positive
definite matrix. Moreover, $\delta_\beta \circ P_\beta$ is symmetric positive
semi-definite since both $\delta_\beta$ and $P_\beta$ are symmetric
positive semi-definite and commute. We conclude using the fact that the sum
of a symmetric positive definite matrix and a symmetric
positive semi-definite matrix is symmetric positive
definite. □

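Let $y \notin \mathcal{H}$. We define $I = \mathrm{supp}_{\mathcal{B}}(\beta^\star(y))$, the $\mathcal{B}$-support
of the solution $\beta^\star(y)$ of $\mathcal{P}_\lambda(y)$. We define the following
mapping

$$\Gamma(\alpha_I, y) = X_I^T (X_I \alpha_I - y) + \lambda \mathcal{N}(\alpha_I).$$

Observe that item 1 of Lemma 1 is equivalent to

$$\Gamma([\beta^\star(y)]_I, y) = 0.$$

Our proof is done in three steps. We first (1.) prove that
there exists a mapping $\bar{y} \mapsto \beta(\bar{y})$ such that for every
element $\bar{y}$ of a neighborhood of $y$ one has $\Gamma([\beta(\bar{y})]_I, \bar{y}) = 0$
and $[\beta(\bar{y})]_{I^c} = 0$. Then, we prove (2.) that $\beta(\bar{y}) = \beta^\star(\bar{y})$
is the solution of $\mathcal{P}_\lambda(\bar{y})$ in a neighborhood of $y$.
Finally, we obtain (3.) equation (1) from the implicit
function theorem.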



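1. The derivative of $\Gamma$ with respect to the first variable
reads on $(\mathbb{R}^{|I|} \setminus U) \times \mathbb{R}^Q$

$$\partial_1 \Gamma(\beta_I, y) = X_I^T X_I + \lambda \, \delta_{\beta_I} \circ P_{\beta_I},$$

where $U = \{ \alpha \in \mathbb{R}^{|I|} \setminus \exists b \in I : \alpha_b = 0 \}$. The mapping
$\partial_1 \Gamma$ is invertible according to Lemma 2. Hence,
using the implicit function theorem, there exists a
neighborhood $\mathcal{O}$ of $y$ such that we can define a mapping
$\beta_I : \mathcal{O} \to \mathbb{R}^{|I|}$ of class $C^1$ over $\mathcal{O}$ that satisfies, for
$\bar{y} \in \mathcal{O}$,

$$\Gamma(\beta_I(\bar{y}), \bar{y}) = 0 \quad \text{and} \quad \beta_I(y) = [\beta^\star(y)]_I.$$

We then extend $\beta_I$ on $I^c$ as $\beta_{I^c}(\bar{y}) = 0$, which defines
a mapping $\beta : \mathcal{O} \to \mathbb{R}^N$.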

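2. Writing the first-order conditions on $\beta^\star(y)$ on the
blocks not included in the $\mathcal{B}$-support, one has

$$\forall b \notin I, \quad \|X_b^T (y - X_I [\beta^\star(y)]_I)\| \leq \lambda.$$

Suppose there exists $b \notin I$ such that $\|X_b^T (X_I [\beta^\star(y)]_I - y)\| = \lambda$.
Then $y \in \mathcal{H}_{I,b}$ since

$$\|X_b^T r\| = \lambda \quad \text{and} \quad X_I^T r = \lambda \mathcal{N}([\beta^\star(y)]_I)$$

for $r = y - X_I [\beta^\star(y)]_I$, which is a contradiction with
$y \notin \mathcal{H}$. Hence,

$$\forall b \notin I, \quad \|X_b^T (X_I [\beta^\star(y)]_I - y)\| < \lambda.$$

By continuity of $\bar{y} \mapsto \beta_I(\bar{y})$ and since $\beta_I(y) = [\beta^\star(y)]_I$,
we can find a neighborhood $\tilde{\mathcal{O}}$ included in $\mathcal{O}$ such that
for every $\bar{y} \in \tilde{\mathcal{O}}$, one has

$$\forall b \notin I, \quad \|X_b^T (X_I \beta_I(\bar{y}) - \bar{y})\| < \lambda.$$

Moreover, by definition of the mapping $\beta_I$, one has

$$X_I^T (\bar{y} - X_I \beta_I(\bar{y})) = \lambda \mathcal{N}(\beta_I(\bar{y})) \quad \text{and} \quad \mathrm{supp}_{\mathcal{B}}(\beta(\bar{y})) = I.$$

According to Lemma 1, the vector $\beta(\bar{y})$ is the solution of
$\mathcal{P}_\lambda(\bar{y})$. Since $\mathcal{P}_\lambda(\bar{y})$ admits a unique solution, $\beta^\star(\bar{y}) = \beta(\bar{y})$
for every $\bar{y} \in \tilde{\mathcal{O}}$.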

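3. Using the implicit function theorem, one obtains
the derivative of $[\beta^\star(\bar{y})]_I$ as

$$[\partial\beta^\star(\bar{y})]_I = -\left( \partial_1 \Gamma([\beta^\star(\bar{y})]_I, \bar{y}) \right)^{-1} \circ \left( \partial_2 \Gamma([\beta^\star(\bar{y})]_I, \bar{y}) \right),$$

where $\partial_2 \Gamma([\beta^\star(\bar{y})]_I, \bar{y}) = -X_I^T$, which leads us to (1).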
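The expression of the derivative of $\Gamma$ with respect to its first variable, used in step 1, rests on the differential of the normalization operator. A short worked derivation, using only the definitions above and valid for any block with $\alpha_b \neq 0$, reads:

```latex
\begin{align*}
\frac{\partial}{\partial \alpha_b} \left( \frac{\alpha_b}{\|\alpha_b\|} \right)
  &= \frac{1}{\|\alpha_b\|} \mathrm{Id} - \frac{\alpha_b \alpha_b^T}{\|\alpha_b\|^3} \\
  &= \frac{1}{\|\alpha_b\|} \left( \mathrm{Id} - \frac{\alpha_b \alpha_b^T}{\|\alpha_b\|^2} \right)
   = \frac{1}{\|\alpha_b\|} P_{\alpha_b^\perp}, \\
\text{hence} \quad \partial \mathcal{N}(\alpha_I) &= \delta_{\alpha_I} \circ P_{\alpha_I}
  \quad \text{and} \quad
  \partial_1 \Gamma(\alpha_I, y) = X_I^T X_I + \lambda \, \delta_{\alpha_I} \circ P_{\alpha_I}.
\end{align*}
```

Since $\Gamma$ is affine in $y$ with linear part $-X_I^T$, the implicit function theorem then gives $(X_I^T X_I + \lambda \, \delta \circ P)^{-1} X_I^T$ on the support, in agreement with (1).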

3.2. Proof of Theorem 2 

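We define for each $b \notin I$

$$\mathcal{H}^{\uparrow}_{I,b} = \left\{ (r, \beta) \in \mathbb{R}^Q \times \mathbb{R}^{|I|} \setminus \|X_b^T r\| = \lambda \text{ and } \forall g \in I, \; X_g^T r = \lambda \frac{\beta_g}{\|\beta_g\|} \right\}.$$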
We prove (1.) that for each $b \notin I$, $\mathcal{H}^{\uparrow}_{I,b}$ is a manifold
of dimension $Q - 1$. Then (2.) we prove that $\mathcal{H}$ is
of measure zero with respect to the Lebesgue measure
on $\mathbb{R}^Q$. Finally (3.), we prove that $\widehat{df}$ is an unbiased
estimate of the DOF.

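1. Note that $\mathcal{H}^{\uparrow}_{I,b} = \psi^{-1}(\{0\})$ where

$$\psi(r, \beta) = \left( \|X_b^T r\|^2 - \lambda^2, \; X_I^T r - \lambda \mathcal{N}(\beta) \right).$$

Remark that the adjoint of the differential of $\psi$,

$$\partial\psi(r, \beta)^T = \begin{pmatrix} 2 X_b X_b^T r & X_I \\ 0 & -\lambda \, \delta_\beta \circ P_\beta \end{pmatrix},$$

has full rank. Indeed, consider the matrix $A = (2 X_b X_b^T r \,|\, X_I)$.
Let $\alpha = (s, u)^T \in \mathbb{R} \times \mathbb{R}^{|I|}$ be such that
$A\alpha = 0$. Then $(2 s X_b^T r, u)^T \in \mathrm{Ker}(X_b \,|\, X_I)$. Since
$(X_b \,|\, X_I)$ has full rank, $2 s X_b^T r = 0$ and $u = 0$; as $\|X_b^T r\| = \lambda > 0$,
we conclude that $\alpha = 0$. As
a consequence, $\partial\psi(r, \beta)$ is non-degenerate. Finally,
$\mathcal{H}^{\uparrow}_{I,b}$ is a manifold of dimension $Q - 1$.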

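2. We prove that $\mathcal{H}_{I,b}$ is of Hausdorff dimension less than
or equal to $Q - 1$. Consider the following mapping

$$\varphi : \mathbb{R}^Q \times \mathbb{R}^{|I|} \to \mathbb{R}^Q \times \mathbb{R}^{|I|}, \quad (r, \beta) \mapsto (r + X_I \beta, X_I^T r).$$

The mapping $\varphi$ is a $C^1$-diffeomorphism between $\mathbb{R}^Q \times \mathbb{R}^{|I|}$
and itself. Thus, $\mathcal{A} = \varphi(\mathcal{H}^{\uparrow}_{I,b})$ is a manifold of
dimension $Q - 1$. We now introduce the projection

$$\pi : \mathcal{A} \to \mathbb{R}^Q, \quad (y, \alpha) \mapsto y.$$

Observe that $\mathcal{H}_{I,b} = \pi(\mathcal{A})$. According to Hausdorff
measure properties (Rogers, 1998), since $\pi$ is 1-Lipschitz,
the Hausdorff dimension of $\pi(\mathcal{A})$ is less than or
equal to the Hausdorff dimension of $\mathcal{A}$, which is the dimension
of $\mathcal{A}$ as a manifold, namely $Q - 1$. Hence, the
measure of $\mathcal{H}_{I,b}$ w.r.t. the Lebesgue measure of $\mathbb{R}^Q$ is
zero.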

3. According to Theorem 1, $y \mapsto \beta^\star(y)$ is $C^1$ on
$\mathbb{R}^Q \setminus \mathcal{H}$. Composing by $X$ gives that $\hat{\mu}$ is differentiable
almost everywhere, hence weakly differentiable.
Moreover, taking the divergence of the differential (1),
one obtains (2). This formula is verified almost everywhere,
outside the set $\mathcal{H}$. Stein's Lemma (Stein, 1981)
gives the unbiasedness of our estimator $\widehat{df}$ of the
DOF.
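The unbiasedness statement can be illustrated by simulation in the case $X = \mathrm{Id}$, where both the solution and the DOF estimate have closed forms. The sketch below is only a Monte Carlo sanity check under stated assumptions: NumPy, contiguous blocks of equal size, a known mean `mu0` and noise level `sigma`; the function name `mc_df_check` and the toy values are hypothetical.

```python
import numpy as np

def mc_df_check(mu0, block_size, lam, sigma, n_draws=20000, seed=0):
    """Compare the mean of the DOF estimate with Efron's covariance formula."""
    rng = np.random.default_rng(seed)
    Q = mu0.size
    Y = mu0 + sigma * rng.standard_normal((n_draws, Q))
    # Split each draw into contiguous blocks of equal size.
    Yb = Y.reshape(n_draws, Q // block_size, block_size)
    norms = np.linalg.norm(Yb, axis=2)
    active = norms > lam
    safe = np.where(active, norms, 1.0)            # avoids division by zero
    shrink = np.where(active, 1.0 - lam / safe, 0.0)
    Mu = (shrink[:, :, None] * Yb).reshape(n_draws, Q)   # block soft thresholding
    # DOF estimate for each draw: sum over active blocks of |b| - lam (|b| - 1) / ||y_b||.
    df_hat = np.sum(active * (block_size - lam * (block_size - 1) / safe), axis=1)
    # Efron's definition, using cov(y_i, mu_i) = E[(y_i - mu0_i) mu_i].
    df_cov = np.mean(np.sum((Y - mu0) * Mu, axis=1)) / sigma ** 2
    return float(df_hat.mean()), float(df_cov)

mu0 = np.array([2.0, -1.0, 0.0, 0.0, 1.0, 1.0])
mean_df_hat, df_from_cov = mc_df_check(mu0, block_size=2, lam=1.0, sigma=1.0)
```

The two returned numbers should agree up to Monte Carlo error, which shrinks as `n_draws` grows.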

3.3. Proof of Corollary 1 

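When $X = \mathrm{Id}$, the solution of $\mathcal{P}_\lambda(y)$ is a block soft
thresholding

$$[\beta^\star(y)]_b = \begin{cases} 0 & \text{if } \|y_b\| \leq \lambda, \\ \left( 1 - \frac{\lambda}{\|y_b\|} \right) y_b & \text{otherwise.} \end{cases} \qquad (3)$$

For every $b \subset J$, we differentiate equation (3)

$$[\partial\beta^\star(y)]_b = \mathrm{Id} - \frac{\lambda}{\|y_b\|} P_{y_b^\perp}.$$

Since $P_{y_b^\perp}$ is a projector on a space of dimension $|b| - 1$,
one has $\mathrm{tr}(P_{y_b^\perp}) = |b| - 1$.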